Abstract


I have previously described psychophysical experiments that involved
the perception of many transparent layers, corresponding to multiple
matching, in doubly ambiguous random dot stereograms. Additional
experiments are described in the first part of this paper. In one
experiment, subjects were required to report the density of dots on each
transparent layer. In another experiment, the minimal density of dots
on each layer that is required for subjects to perceive it as a distinct
transparent layer was measured. The difficulties encountered by
stereo matching algorithms when applied to doubly ambiguous
stereograms are described in the second part of this paper. Algorithms that
can be modified to perform consistently with human perception, and
the constraints imposed on their parameters by human perception, are
discussed.
1 Introduction


The  depth  of  3D  objects  is  lost  in  the  optical  projection  process.  Stereo  vision, 
in  which  two  simultaneous  images  of  the  same  scene  are  recorded  in  the  two 
eyes,  can  be  used  to  recover  the  lost  depth.  In  computational  stereo  algorithms, 
the  extraction  of  depth  from  binocular  stereo  begins  with  the  formation  of 
a  disparity  map  by  matching  the  two  images  (the  disparity  of  an  object  is 
defined  as  the  difference  between  its  positions  in  the  two  images).  Thus,  a 
disparity value is assigned to every location in the image. In order to solve the
matching  ambiguity  at  each  feature  in  the  image,  neighboring  features  can  be 
used.  It  is  generally  assumed  that  many  neighboring  features  should  have  a 
match  at  about  the  same  disparity  for  a  matching  to  be  plausible.  Different 
stereo  matching  algorithms  differ  in  how  they  implement  this  neighborhood 
interaction  (or  smoothness  constraint),  among  other  things. 

I have previously described [9, 10] psychophysical experiments whose
results could not be readily explained by existing stereo matching algorithms.
In these experiments, subjects were presented with doubly ambiguous
stereograms (defined in section 2.1). In some cases a few transparent surfaces were
perceived corresponding to multiple matches; in other cases transparent
surfaces corresponding to unique matches were perceived. Some stereograms were
constructed to have the same cross-correlation between the left and right
images, yet different numbers of transparent layers were perceived. The results
of these experiments are briefly summarized in section 2.

In section 3, additional experiments are described. First, subjects were
required to report the density of dots on each transparent layer of a doubly
ambiguous stereogram by adjusting the density of dots on three simple
transparent layers. In another experiment, the minimal density of dots on each
layer that is required for subjects to perceive it as a distinct transparent
layer was measured. These experiments were designed to clarify which
algorithmic principle can be used to explain the results in the experiments with
doubly ambiguous stereograms.

In section 4, the difficulties encountered by stereo matching algorithms,
when applied to doubly ambiguous stereograms, are discussed. Two simple
matching algorithms, representing two different simple matching principles,
are discussed in detail: a patch-wise correlation algorithm (e.g. [5, 2]), and
Prazdny’s matching algorithm [8]. For comparison with human data, an
additional stage was added to each algorithm, where the matching results were
used to determine how many transparent layers exist in the image. The range
of parameters for which the performance of these algorithms was consistent
with humans, and the sensitivity of their tuning, are discussed.
2 Multiple matching in ambiguous stereograms

2.1  Doubly  ambiguous  stereograms 

In a doubly ambiguous random dot stereogram (RDS), a sparse random pattern
(figure 1b) is copied twice in each image (figure 1a). The horizontal gap between
the two copies is G_r pixels in the right image and G_l pixels in the left image.
Each dot of the original sparse random pattern (figure 1b) therefore has two
copies in each image. Each such group of dots, the micropattern of the doubly
ambiguous RDS (figure 1c), is an instance of the double nail illusion stimulus [4].
There are four possible matches of the elements of the micropattern that are
equally plausible. If matching is unique, namely, if a point can be matched to
only a single point in the other image, they form two mutually exclusive pairs
(full and hollow circles in figure 1c).
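The construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `doubly_ambiguous_rds`, the square pattern, the image size and the default gaps are hypothetical choices, not the parameters of the actual stimuli.

```python
import numpy as np

def doubly_ambiguous_rds(size=64, density=0.09, g_r=2, g_l=4, seed=0):
    """Build a doubly ambiguous RDS from one sparse generating pattern.

    Returns (left, right, pattern) as boolean arrays. All names and
    default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    pattern = rng.random((size, size)) < density  # sparse generating pattern
    width = size + max(g_r, g_l)                  # room for the shifted copy

    def double(gap):
        image = np.zeros((size, width), dtype=bool)
        image[:, :size] |= pattern            # first copy
        image[:, gap:gap + size] |= pattern   # second copy, `gap` pixels to the right
        return image

    return double(g_l), double(g_r), pattern
```

Setting one of the gaps to zero leaves a single copy of the pattern in that image, which gives the special case discussed in section 2.2.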

2.2  Summary  of  previous  results 

In an unpublished work, Braddick presented subjects with ambiguous
stereograms that were, in effect, a special case of the doubly ambiguous stereograms
described in section 2.1. In these stereograms, the generating pattern was
copied twice in only one image, equivalent to choosing G_r = 0 or G_l = 0. The
micropattern of such a stereogram, one dot in one image and two dots in the
other image, is also known as Panum’s limiting case. When presented with
Panum’s limiting case, subjects’ perception corresponds to matching the single
dot in one image to both dots in the other image (if the distance between the
dots is within Panum’s limiting area; see footnote 1). When viewing a stereogram
composed of such micropatterns, the perception was similar: subjects reported
seeing two transparent surfaces, corresponding to a multiple matching of the
generating pattern.


2.3  Multiple  matching 

In the first experiment, an ambiguous stereogram of the type described in
section 2.1, with G_r ≠ G_l and a dot density of 9% (of the generating pattern),
was used. Subjects identified up to four transparent layers, corresponding to
all four possible matches of the micropattern dots (figure 1c). The differences
between the transparent layers were approximately 6 minutes of arc. The

Footnote 1: This is different from their perception when presented with the micropattern of a doubly
ambiguous stereogram, as will be discussed in section 3.3.


Figure 1: a) A doubly ambiguous random dot stereogram. b) The sparse
random pattern that is used to generate the doubly ambiguous stereogram in
a. For illustration purposes, the density of the sparse pattern is reduced; in
the actual experiment it was equal to the density of the background. c) The
enlarged micropattern of the RDS in a, where the two pairs of matches that
are mutually exclusive if matching is unique are separately marked by filled
and hollow circles.
smaller the differences were, the easier it was to see the layers simultaneously,
but  the  harder  it  was  to  distinguish  them  in  depth. 

The  correlation  function  between  the  left  and  right  images  of  a  stereogram 
of  the  type  used  in  this  experiment  is  given  in  figure  2a.  There  are  four  peaks 
in  this  function,  corresponding  to  the  disparities  of  the  four  transparent  layers 
that  were  seen.  This  result  seems  to  suggest  that  all  peaks  in  the  correlation 
function  give  rise  to  the  perception  of  distinct  transparent  layers. 


Figure  2:  The  correlation  (as  a  function  of  disparity)  between  the  left  and  the 
right  images  of  doubly  ambiguous  RDS’s.  The  correlation  window  was  equal 
in  size  to  the  generating  pattern. 


2.4  Unique  matching 

In  the  second  experiment,  an  ambiguous  stereogram  of  the  type  described  in 
section 2.1, with G_r = G_l, was used. Two of the four possible disparities of the
micropattern (figure 1c) are identical and therefore the correlation between the
left  and  right  images  (figure  2b)  has  only  three  peaks.  The  conclusion  given  in 
the  previous  section  predicts  that  three  transparent  layers  will  be  identified  in 
this  case,  corresponding  to  the  three  peaks.  However,  subjects  identified  only 
one  opaque  surface,  whose  disparity  corresponded  to  the  maximum  correlation 
in  figure  2b.  Thus  not  all  the  local  maxima  in  the  correlation  function  give 
rise  to  the  perception  of  distinct  transparent  layers. 
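The peak counts quoted in sections 2.3 and 2.4 can be checked numerically. The sketch below is an assumption-laden illustration: it uses a simple coincident-dot count as the correlation measure, a whole-image window, and hypothetical helper names (`make_pair`, `correlation`, `peak_disparities`). Disparity is taken as the right-image position minus the left-image position.

```python
import numpy as np

def make_pair(g_l, g_r, size=128, density=0.09, margin=8, seed=1):
    """Left/right images of a doubly ambiguous RDS plus the pattern dot count."""
    rng = np.random.default_rng(seed)
    pattern = rng.random((size, size)) < density
    width = size + max(g_l, g_r) + 2 * margin  # zero margins absorb the shifts

    def double(gap):
        image = np.zeros((size, width), dtype=bool)
        image[:, margin:margin + size] |= pattern
        image[:, margin + gap:margin + gap + size] |= pattern
        return image

    return double(g_l), double(g_r), int(pattern.sum())

def correlation(left, right, max_disp=6):
    """Coincident-dot count per disparity d (left x is compared with right x + d).

    max_disp must not exceed the margin used in make_pair, so that np.roll
    only wraps empty columns.
    """
    return {d: int(np.sum(np.roll(left, d, axis=1) & right))
            for d in range(-max_disp, max_disp + 1)}

def peak_disparities(g_l, g_r, max_disp=6):
    left, right, n_dots = make_pair(g_l, g_r)
    corr = correlation(left, right, max_disp)
    # A disparity that aligns a whole copy yields at least n_dots coincidences.
    peaks = sorted(d for d, c in corr.items() if c >= n_dots)
    return peaks, corr
```

With unequal gaps (for example `g_l=4, g_r=2`) this should report four peaks, at disparities 0, G_r, -G_l and G_r - G_l. With equal gaps two of these coincide at zero, leaving three peaks with the central one the highest.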


Figure  3:  The  correlation  (as  a  function  of  disparity)  between  the  left  and  the 
right  images  of  doubly  ambiguous  RDS’s. 

2.5  The  information  in  the  correlation  function 

There is one difference between the correlation functions plotted in figures 2a, b
that may explain the difference in human perception. The disparity of the
single opaque plane perceived in the second experiment corresponds to the global
maximum of the correlation function in figure 2b, whereas the correlation
function in figure 2a has four identical maxima. Additional experiments showed,
however, that this difference cannot explain human perception.

In one experiment, two stereograms, whose correlation functions are given
in figures 3a, b respectively, were presented to subjects. The stereogram
corresponding to figure 3a was similar to the stereogram described in section 2.3,
with additional dots that could all be matched at disparity -2. In this case
subjects identified up to four transparent layers. The stereogram corresponding
to figure 3b was similar to the stereogram described in section 2.4, with
additional dots that could all be matched at disparity 4. In this case subjects
identified only two transparent layers. These experiments show that the
correlation between the left and right images, when computed over a large region
around a point (of an order of magnitude of the whole image), cannot account
for the subtleties of human perception.
3 Additional experiments

The following experiments were designed to help clarify which kinds of
stereo matching algorithmic principles can more readily explain the results of
the experiments described in section 2.

3.1 Experiment 1: the density of the transparent layers

In the experiment described in section 2.3, most subjects identified three to
four transparent layers. This perception corresponds to multiple matching of
the generating sparse pattern of the stereogram. The generating pattern could
instead be matched as a whole to one of its copies in the second image with a
single disparity, which would lead to the simultaneous perception of only two
transparent layers. The multiple matching of the generating pattern can be
implemented by the assignment of a unique disparity to each dot in the
generating pattern, where some of the dots are assigned one disparity and the
rest are assigned another disparity. Alternatively, it may be that multiple
matches are assigned to each dot in the generating pattern, since at each dot
there are two disparities that are equally supported by neighboring dots. Both
solutions of the matching problem lead to the identification of four
transparent layers.

The present experiment was designed to choose between these two
explanations for the multiple matching results.

Methods: 

Five  subjects  participated  in  this  experiment.  They  were  presented  with  one  of 
the  doubly  ambiguous  stereograms  described  in  section  2.3.  Adjacent  to  this 
stereogram, another stereogram of three transparent layers was presented (see footnote 2),
where  the  height  of  the  three  transparent  layers  matched  the  height  of  the  top 
three  ambiguous  layers  of  the  ambiguous  RDS.  The  subjects  were  asked  to 
modify  the  density  of  dots  on  each  layer  of  the  second  (adjacent)  stereogram, 
in  steps  of  1%  or  4%,  until  the  density  of  each  transparent  layer  matched  the 
density  of  the  corresponding  ambiguous  transparent  layer. 

Density  matching  of  transparent  layers  was  initially  quite  difficult.  The 
subjects  started  with  two  training  sessions.  In  the  first  session  they  were 
asked  to  match  the  density  of  a  single  opaque  layer  to  another  single  opaque 

Footnote 2: In this stereogram each dot could be matched to any other dot in the image, the “usual”
ambiguity, but the additional ambiguity created by doubling a certain generating pattern
as described in section 2.1 was eliminated.
layer. They received feedback, and ended this session when their matching was
perfect. This session proved to be quite easy for everyone. In the second session
of the training, the subjects had to match the densities of three transparent
layers to the densities of three other transparent layers. This task was initially
quite hard, but after a few trials with feedback, the subjects had learned to do
this task and felt quite confident of being able to do it well. They stopped
the second training session when the error in density matching per layer was
smaller than or equal to 1%. One subject could not reach this level of performance.

After finishing the two training sessions, subjects were presented with two
doubly ambiguous RDS’s of the type described in section 2.3, where the
densities of dots on the generating sparse pattern were 9% and 11% respectively.
They had to match the densities of the top three ambiguous layers. Two of the
four subjects were presented with a third stereogram, an ambiguous RDS of
the type described in section 2.2, where the density of dots on the generating
sparse pattern was 9%. In this case the subjects were asked to match the
densities of two ambiguous layers. All the subjects said, after the experiment
was over, that the density matching of the ambiguous layers was more
difficult than the density matching of the three transparent layers in the training
session.

Results: 

Table  1  gives  the  data  of  the  four  subjects  that  were  able  to  learn  to  do  the 
density  matching  accurately  enough,  to  within  1%  error. 


           3 layers (density 9%)     3 layers (density 11%)    2 layers (density 9%)
Subject    top   middle   bottom     top   middle   bottom     top   bottom
1          5%    6%       5%         5%    6%       9%         6%    6%
2          2%    5%       8%         6%    6%       7%         4%    9%
3          5%    4%       6%         6%    3%       8%         -     -
4          5%    0%       7%         0%    5%       6%         -     -

Table 1: The first two major columns give the density of dots on each of the
top three ambiguous layers (top, middle and bottom) for the two stereograms
described in the text, as reported by four subjects. The last major column gives
the density of dots on each of two ambiguous layers (top and bottom) for the
third case described in the text, as reported by two subjects.
Conclusions:

The hypothesis that each dot in the generating pattern is assigned a unique
disparity predicts that the density matching results approach an average
density of 4.5% per layer in the first stereogram, and an average density of 5.5%
per layer in the second stereogram. The hypothesis that each dot in the
generating pattern is assigned multiple disparities predicts that the results approach
an average density of 9% per layer in the first stereogram, and an average
density of 11% per layer in the second stereogram. In practice, the more accurate
subjects (the first three rows in table 1) assigned an average density of 5.1%
per layer in the first stereogram, and an average density of 6.2% per layer
in the second stereogram. These average densities are somewhat larger than
the density predicted by the unique matching hypothesis, but much smaller
than the average density predicted by the multiple matching hypothesis (see
footnote 3). Note that the subjects did not have to report the density of the
fourth layer, which most subjects found difficult to see simultaneously with the
other three layers. Consequently, the average reported density can be expected
to be higher than predicted.

The  results  with  the  third  stereogram  are  interesting  since  when  presented 
with  the  micropattern  of  this  stereogram,  people’s  perception  corresponds  to 
multiple  matching  of  the  dots  (section  2.2).  The  average  density  reported  by 
subjects  in  this  case,  6.25%,  is  larger  than  before  (5.1%),  but  still  intermediate 
between  the  prediction  of  the  hypothesis  of  multiple  matching  at  each  dot  (9%) 
and  the  prediction  of  the  hypothesis  of  unique  matching  (4.5%),  closer  to  the 
latter. This suggests that either this experiment does not measure correctly the
number  of  dots  that  are  matched  at  each  disparity,  or  that  there  is  a  difference 
between  the  matching  of  isolated  features  and  the  matching  of  images  with 
texture. 

The  results  of  the  density  matching  experiment,  if  indeed  this  experiment 
correctly  measures  the  number  of  dots  matched  at  each  disparity,  seem  to 
support  the  hypothesis  that  a  unique  disparity  is  assigned  to  each  dot  of  the 
generating  sparse  pattern  of  a  doubly  ambiguous  stereogram,  where  some  dots 
are  assigned  one  disparity  and  some  another  disparity. 
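The averages quoted above can be recomputed from the entries of table 1. The snippet below is a small arithmetic check, not part of the original analysis; the dictionary layout and the helper name `summarize` are illustrative assumptions, and the unique matching prediction is taken to be half the generating density, as in the text.

```python
# Densities (percent) as read from table 1; one tuple per subject.
reported = {
    "3 layers, 9%":  [(5, 6, 5), (2, 5, 8), (5, 4, 6), (5, 0, 7)],
    "3 layers, 11%": [(5, 6, 9), (6, 6, 7), (6, 3, 8), (0, 5, 6)],
    "2 layers, 9%":  [(6, 6), (4, 9)],
}
generating_density = {"3 layers, 9%": 9.0, "3 layers, 11%": 11.0, "2 layers, 9%": 9.0}

def summarize(condition, n_subjects=3):
    """Mean reported density over the first n_subjects rows, with both predictions."""
    rows = reported[condition][:n_subjects]
    values = [v for row in rows for v in row]
    mean = sum(values) / len(values)
    multiple = generating_density[condition]  # every dot matched at both disparities
    unique = multiple / 2                     # every dot matched at one disparity
    closer = "unique" if abs(mean - unique) < abs(mean - multiple) else "multiple"
    return mean, unique, multiple, closer

for condition in reported:
    mean, unique, multiple, closer = summarize(condition)
    print(f"{condition}: mean {mean:.2f}%, unique {unique}%, "
          f"multiple {multiple}%, closer to {closer}")
```

In all three conditions the mean lies between the two predictions and is closer to the unique matching value.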

Footnote 3: I should note, however, that the results in a somewhat different experiment, where
the subjects had to match the density on each transparent layer separately, were more
ambiguous; in this experiment, the densities assigned to each layer were close to the average
between the prediction of the multiple matching hypothesis and the prediction of the unique
matching hypothesis. These results are not included in this paper since the task was harder
for the subjects and the results were less reliable.
3.2 Experiment 2: the lowest density of the transparent layers

When looking at stereograms with transparent layers, ambiguous or not
ambiguous, subjects reported seeing points floating in a range of depth values.
Subjects were asked to report a layer when they subjectively perceived a layer.
This could be a difficult decision for them in some cases. The present
experiment was designed to identify the lowest density of dots at a given disparity,
above which these subjects subjectively decide that there exists a transparent
layer at this disparity.

Methods: 

Four subjects participated in this experiment. They were presented with eight
stereograms of three transparent layers. The density of the two main layers was
always 9% (namely, 9% of the pixels were black). The density of the third layer,
either the top or the bottom layer, was lower and variable. The number of dots
in this layer, measured as a fraction of the number of dots in the stereogram
altogether, was 4%, 6%, 8% or 10% (where 33% means that all layers are of
the same density). The subjects were asked to report how many transparent
layers they subjectively perceived as layers, and to put a cursor, whose depth
they could change, on each of the layers they identified. The latter procedure
was required to verify which layers they had actually seen and how accurate
their judgement was.

Results: 

The subjects always identified
the  main  dense  layers.  Table  2  shows  for  which  conditions  each  subject  also 
identified  the  sparse  layer.  Two  subjects  (the  first  two  rows  in  table  2)  were 
fairly  accurate  in  their  depth  judgement  of  the  two  main  layers,  whereas  the 
other  two  subjects  were  less  accurate  (deviating  by  more  than  a  pixel  from  the 
actual  depth  of  a  dense  layer). 

All four subjects judged the depth of the main layers more accurately than
the sparse layer. One subject identified a layer at an intermediate depth value
between the two dense layers when the density of the sparse layer was 6%
(namely, too low for this subject to perceive the sparse layer as a distinct
layer, but large enough to indicate to her that something was going on).
sparse top layer

sparse  bottom  layer 

6% 

8% 

10% 

no 

yes 

no 

no 

no 

yes 

yes 

no 

yes 

yes 

no 

no 

yes 

yes 

no 

yes 

yes 

no 

no 

no 

no 

no 

no 

yes 

no 

no 

no 

no 

Table  2:  Answers  whether  a  subject  identified  the  sparse  layer  for  a  given 
condition.  Two  subjects  (whose  data  is  shown  in  the  first  two  rows)  were  more 
accurate  than  the  other  two  subjects. 


Conclusions: 

The  results  of  the  four  subjects  participating  in  this  experiment,  especially  the 
two  more  accurate  subjects,  were  fairly  consistent.  They  subjectively  perceived 
a  distinct  layer  at  a  given  disparity  when  more  than  7%  of  the  dots  in  the  image 
could  be  matched  at  that  disparity. 


3.3 Experiment 3: the micropattern of a doubly ambiguous RDS


In the double nail illusion experiment [4], in which a configuration similar to
the micropattern of a doubly ambiguous stereogram (section 2.3) was used,
Krol & van de Grind reported that subjects selected only those disparities
corresponding to the full circles in figure 1c. However, in all their experiments
G_r was almost identical to G_l. In the present experiment, subjects were asked
to match two-dot patterns, as in figure 1c, but with G_r ≠ G_l. The conditions
of the experiment described in section 2.3 were repeated, where the isolated
micropatterns were presented to the subjects instead of the ambiguous
stereograms. Two subjects participated in this experiment. In agreement with [4],
both subjects selected only those disparities corresponding to the full circles
in figure 1c in all cases.


4  Computational  discussion 

The first experiment described in section 3.1 suggests that the multiple
matching effect, discussed in section 2.3, can possibly be explained by an algorithm
that selects at each feature a unique disparity. In this section, two such
algorithms are discussed. These algorithms were selected for their simplicity and
as representatives of two different matching principles; it is not suggested here
that they are biologically plausible and it is not assumed that they can deal
with noisy real images.

The  first  is  a  patch-wise  correlation  algorithm  (e.g.  [5,  2]),  in  which  the 
disparity  selected  at  each  feature  is  the  disparity  that  maximizes  the  correlation 
between  a  patch  around  the  feature  in  one  image  and  a  corresponding  displaced 
patch  in  the  second  image. 
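A minimal sketch of such a patch-wise scheme follows, under assumptions that are not taken from [5, 2]: binary dot images, a square patch, a coincident-dot count as the correlation measure, and ties resolved in favor of the lowest disparity. The function name and parameters are hypothetical.

```python
import numpy as np

def patch_correlation_disparities(left, right, half=4, max_disp=6):
    """Assign to each left-image dot the disparity with the best patch correlation.

    left, right: 2-D boolean arrays of equal shape. A disparity d compares the
    patch centred at (y, x) in the left image with the patch centred at
    (y, x + d) in the right image. Returns {(y, x): disparity}.
    """
    pad = half + max_disp
    L = np.pad(left.astype(int), pad, mode="constant")
    R = np.pad(right.astype(int), pad, mode="constant")
    result = {}
    for y, x in zip(*np.nonzero(left)):
        cy, cx = y + pad, x + pad
        patch = L[cy - half:cy + half + 1, cx - half:cx + half + 1]
        scores = []
        for d in range(-max_disp, max_disp + 1):
            shifted = R[cy - half:cy + half + 1, cx + d - half:cx + d + half + 1]
            scores.append(int(np.sum(patch * shifted)))  # coincident dots
        # np.argmax returns the first maximum, i.e. the lowest tied disparity
        result[(int(y), int(x))] = int(np.argmax(scores)) - max_disp
    return result
```

Because exactly one disparity is kept per feature, this sketch cannot by itself assign multiple matches to a dot; any transparent layers must emerge from different dots selecting different disparities.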

The second algorithm, Prazdny’s stereo matching algorithm [8], identifies
and matches features in both images, using a measure closely related to the
disparity gradient defined in [1] to enforce smooth matching. Disparity
gradient is defined for two features in one image, each assigned a specific disparity:
it is the disparity difference between the two features divided by the distance
(averaged over both images) between the features. In Prazdny’s algorithm,
at a given feature in the image, each disparity that corresponds to a feasible
match receives decreasing support from its neighbors (within a certain
neighborhood) with increasing disparity gradient. The most supported disparity is
selected at each feature. Thus, since the algorithm uses the disparity gradient
to evaluate the quality of a particular match, larger differences in disparity are
tolerated for features that are further apart.
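The idea of graded support can be illustrated with a short sketch. It is not the implementation of [8]: the Gaussian fall-off, the constant `c`, the neighborhood radius, and the rule that each neighbor contributes through its best-fitting candidate are all assumptions made here for illustration. The disparity gradient follows the definition given in the text.

```python
import math

def disparity_gradient(p1, d1, p2, d2):
    """Disparity difference divided by the distance averaged over both images.

    p1, p2: (x, y) positions in the left image; d1, d2: horizontal disparities,
    so the right-image position of a feature is (x + d, y).
    """
    dist_left = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    dist_right = math.hypot((p2[0] + d2) - (p1[0] + d1), p2[1] - p1[1])
    return abs(d2 - d1) / ((dist_left + dist_right) / 2)

def support(gradient, c=0.7):
    """Assumed support function: decreases smoothly as the gradient grows."""
    return math.exp(-gradient ** 2 / (2 * c ** 2))

def select_disparities(candidates, radius=10.0, c=0.7):
    """candidates: {left position: [feasible disparities]} -> {position: chosen}."""
    chosen = {}
    for p, disps in candidates.items():
        best, best_score = None, -1.0
        for d in disps:
            score = 0.0
            for q, q_disps in candidates.items():
                if q == p or math.dist(p, q) > radius:
                    continue
                # assumption: a neighbour supports through its best candidate
                score += max(support(disparity_gradient(p, d, q, dq), c)
                             for dq in q_disps)
            if score > best_score:
                best, best_score = d, score
        chosen[p] = best
    return chosen
```

For two isolated features 4 pixels apart in the left image, with candidate disparities (0, 2) and (-4, -2), this sketch selects 0 and -2, the pairing with the lowest disparity gradient.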

A third algorithm, PMF [7], is also discussed; not because it represents a
different  matching  principle,  but  because  it  has  been  argued  by  its  authors  [6] 
that  this  algorithm  can  explain  human  perception  in  the  experiments  discussed 
in  section  2.3.  Similarly  to  Prazdny’s  matching  algorithm,  the  PMF  algorithm 
uses  the  disparity  gradient  between  two  features  to  enforce  smoothness  of  the 
disparity  field.  In  this  algorithm,  each  candidate  match  (disparity)  at  a  given 
feature  accumulates  support  from  neighboring  features  before  the  selection 
of  the  best  (or  most  supported)  match.  This  support  is  given  only  if  the 
disparity  gradient  between  the  two  neighbors  is  smaller  than  a  certain  limit, 
the  disparity  gradient  limit.  The  use  of  this  particular  smoothing  method  was 
based  on  psychophysical  evidence  [1]  that  simultaneous  stereo  fusion  of  two 
features  is  possible  only  if  the  disparity  gradient  between  them  is  smaller  than 
1. 

Pollard & Frisby have previously argued that a disparity gradient limit of
1  should  be  the  limit  in  their  algorithm  when  used  to  model  human  stereo 
vision  [7].  In  order  to  explain  human  perception  in  the  experiments  discussed 
in  section  2.3,  they  changed  this  limit,  arbitrarily  setting  it  to  0.5.  As  a 
result,  the  modified  PMF  algorithm  accounted  for  the  experiments  described 
in  section  2.3.  Unfortunately,  this  change  of  the  threshold  value,  which  was 
only noted in a figure caption in [6], resulted in a failure of the PMF algorithm
to  account  for  some  other  well-known  psychophysical  results  (a  more  detailed 
discussion  is  given  in  section  4.1). 

Rather  than  solving  the  problem,  Pollard  &  Frisby’s  letter  [6]  demonstrated 
the  difficulty  encountered  by  stereo  matching  algorithms  when  dealing  with 
doubly ambiguous random dot stereograms. A particular selection of parameters
can make it possible for the algorithm to explain human perception in some
cases,  but  the  same  parameters  are  unsuitable  to  explain  human  perception  in 
other  cases.  In  the  rest  of  this  section,  using  the  three  algorithms  mentioned 
above,  the  questions  of  whether  these  algorithms  can  be  modified  to  explain 
human  perception,  and  how  narrow  the  tuning  range  of  their  parameters  is  if 
indeed  they  can,  are  studied. 


4.1 First difficulty: different perception for micropatterns and RDS’s


[Figure 4: right image (nails R1 and R2) shown above left image (nails L1 and L2)]


Figure  4:  A  stereogram  of  two  nails  as  in  the  double  nail  illusion  experiment 
[4].  Two  nails  are  seen  in  both  images.  The  right  image  is  shown  above  the  left 
image  for  purpose  of  illustration.  The  separation  between  the  nails  is  2  pixels 
in  the  right  image  and  4  in  the  left,  where  each  pixel  corresponds  to  roughly 
1.2  minutes  of  arc  as  in  [9]. 

The example shown in figure 4 is a simple ambiguous configuration, similar
to the one used in the double nail illusion experiment [4]. This example is
the micropattern of the experiment described in section 2.3 (namely, the
stereogram in that experiment is made of a random distribution of such patterns).
There are four possible matchings of the two nails in the left image (L1, L2) to
the nails in the right image (R1, R2). Table 3 gives the disparity gradient
between L1 and L2 for each of these matchings. The disparity gradient is defined
[1] to be the difference between the disparities assigned to the two features
divided by the average distance (in the two images) between the two features.

L1 matched to   disparity   L2 matched to   disparity   disparity gradient
R1              0           R1              -4          2
R1              0           R2              -2          2/3
R2              2           R1              -4          2
R2              2           R2              -2          2

Table 3: Four possible pairings of nails L1 and L2 in the left image to nails R1
and R2 in the right image are listed. The disparity gradient is calculated for
each. The complete derivations of the disparity gradient, for the four possible
matchings respectively, are $\frac{0-(-4)}{(4+0)/2} = 2$,
$\frac{0-(-2)}{(4+2)/2} = \frac{2}{3}$, $\frac{2-(-4)}{(4+2)/2} = 2$ and
$\frac{2-(-2)}{(4+0)/2} = 2$.

It is clear from Table 3 that a disparity gradient limit of 0.5 is smaller than
the disparity gradient between any possible pairing of L1 and L2. Therefore no
pairing can support the other. Thus the PMF algorithm with a disparity
gradient limit of 0.5, when matching this stereogram, is equally likely to detect any
pairing without any preference. However, the results of experiment 3 reported
in section 3.3 show that humans, when presented with this stereogram, always
see a single matching, L1 with R1 and L2 with R2.
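The four disparity gradients of Table 3, and the effect of a hard limit on them, can be reproduced with exact fractions. The coordinates below assume the figure 4 geometry with R1 aligned to L1 at position 0 and disparity taken as right position minus left position; the names `LEFT`, `RIGHT`, `gradient` and `supported` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Horizontal positions in pixels: nails 4 apart in the left image, 2 in the right.
LEFT = {"L1": 0, "L2": 4}
RIGHT = {"R1": 0, "R2": 2}

def gradient(match_l1, match_l2):
    """Disparity gradient between L1 and L2 for one pairing."""
    d1 = RIGHT[match_l1] - LEFT["L1"]
    d2 = RIGHT[match_l2] - LEFT["L2"]
    dist_left = abs(LEFT["L2"] - LEFT["L1"])
    dist_right = abs(RIGHT[match_l2] - RIGHT[match_l1])
    return Fraction(abs(d1 - d2)) / Fraction(dist_left + dist_right, 2)

gradients = {pair: gradient(*pair) for pair in product(RIGHT, repeat=2)}

def supported(limit):
    """Pairings whose disparity gradient lies below a hard limit."""
    return [pair for pair, g in gradients.items() if g < limit]

for pair, g in gradients.items():
    print(pair, g)
print("limit 0.5:", supported(Fraction(1, 2)))
print("limit 1:  ", supported(1))
```

With a limit of 0.5 no pairing receives support, whereas with a limit of 1 only the pairing L1 with R1 and L2 with R2 does, which is the matching that subjects reported.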

This example is not an accident. In fact, no disparity gradient limit
exists with which PMF can explain all these related experiments. The range of
disparity gradient limits for which the PMF algorithm can explain the
experiments described in section 2.3 is identical to the range of disparity gradient
limits for which it fails to explain the experiments described in section 3.3.
The difficulty follows from one of the points discussed in [9], namely, humans’
response to isolated micropatterns (such as figure 4) appears to be different
from their response to the stereograms discussed in section 2.3. The PMF
algorithm, on the other hand, responds to both types of stimuli in a similar
manner.

The PMF algorithm fails because it uses a fixed threshold (the disparity gradient limit). Favoring low disparity gradients in a gradual way leads to better results. Prazdny’s stereo matching algorithm, which uses the disparity gradient to give support in a gradual way rather than thresholding it, can simultaneously explain the response to the isolated micropattern and to the stereogram for the same range of parameters. On the other hand, the correlation-based algorithm performs similarly to PMF with a disparity gradient limit of 0.5, namely, it fails. However, since a correlation-based algorithm is not designed for the matching of isolated features, this failure is not surprising.
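The contrast between a fixed threshold and gradual support can be illustrated with a small sketch. The Gaussian-shaped weight and its width `c` are assumptions made for illustration only; they are not claimed to be the support function actually used in Prazdny’s algorithm.

```python
import math

# Disparity gradients of the four pairings in Table 3,
# keyed by (match of L1, match of L2).
GRADIENTS = {
    ("R1", "R1"): 2.0,
    ("R1", "R2"): 2.0 / 3.0,
    ("R2", "R1"): 2.0,
    ("R2", "R2"): 2.0,
}


def threshold_support(gradient, limit=0.5):
    """Fixed-limit rule: full support at or below the limit, none above it."""
    return 1.0 if gradient <= limit else 0.0


def graded_support(gradient, c=1.0):
    """Illustrative smooth rule: support decays with the gradient (assumed shape)."""
    return math.exp(-gradient ** 2 / (2.0 * c ** 2))


hard = {pair: threshold_support(g) for pair, g in GRADIENTS.items()}
soft = {pair: graded_support(g) for pair, g in GRADIENTS.items()}
best = max(soft, key=soft.get)

print("thresholded:", hard)
print("graded:", soft)
print("preferred pairing under graded support:", best)
```

With the threshold, all four pairings receive identical (zero) support; with the graded rule, the pairing with the smallest gradient is preferred.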

4.2  Second  difficulty:  different  stereograms  with  the 
same  correlation 

Another problem arising from the experiments discussed in section 2 concerns RDSs with identical correlation functions. In such stereograms (described in section 2.5), when large regions (the regions including the doubled generating pattern) in the two images are correlated with each other, the resulting graphs look very similar (figures 3a,b), yet subjects perceive a different number of transparent layers in each. Another example (with different parameters) is given in figures 5a,b, where subjects perceive up to four layers and only two layers respectively, although the two stereograms have exactly the same correlation function.

This problem does not concern only the patch-wise correlation algorithm.
The  correlation  between  the  left  and  right  images  is  a  good  measure  of  the 
kind  of  interaction  and  disparity  support  neighboring  features  provide  to  the 
matching  at  a  certain  feature.  All  successful  stereo  matching  algorithms  require 
such  interactions  and  use  support  from  neighboring  features  in  one  form  or 
another  to  select  a  disparity  at  a  given  point.  Thus  the  fact  that  humans 
perceive  a  different  number  of  layers  in  stereograms  where  the  neighborhood 
interactions  seem  similar  poses  a  difficulty  to  any  stereo  matching  algorithm. 

In order to compare the output of stereo matching algorithms to humans, a postprocessing stage was added to all of them, in which the number of transparent layers was decided. In the following, the number of dots assigned (uniquely) to each disparity, summed over the whole image, was compared to a threshold to determine whether a transparent layer should be reported at that disparity (footnote 4). The value of the threshold parameter, along with the values of the other parameters of each algorithm, was varied to determine whether there exists a tuning of the algorithm, that is, a particular set of parameters, for which the algorithm can explain human perception. The sensitivity of the algorithms to any particular tuning was also studied.

Footnote 4: This postprocessing stage mimics humans’ subjective decision of whether they see a transparent layer or only isolated points at a particular disparity. I should note here that people seem to have difficulty identifying more than three layers in stereograms that have four “simple” transparent layers. No attempt is made here to mimic this constraint.
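The postprocessing stage described above can be sketched as follows. The function name, its arguments, and the normalisation by the total number of image points are assumptions for illustration; the text only states that the per-disparity dot count is compared to a threshold.

```python
from collections import Counter


def report_layers(assigned_disparities, num_image_points, threshold):
    """Return the disparities whose share of image points reaches the threshold.

    assigned_disparities: one disparity per matched dot (unique assignment).
    num_image_points: total number of points, used to turn counts into densities.
    threshold: minimal fraction of points, e.g. 0.04 for 4%.
    """
    counts = Counter(assigned_disparities)
    return sorted(
        d for d, n in counts.items() if n / num_image_points >= threshold
    )


# Toy example: 100 dots, three of them stray matches at disparity 0.
matches = [2] * 50 + [4] * 47 + [0] * 3
layers = report_layers(matches, num_image_points=len(matches), threshold=0.04)
print(layers)
```

In this toy example the three stray matches at disparity 0 fall below a 4% threshold and are not reported as a layer; lowering the threshold to 2% would report them.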
The correlation with variable window sizes

The correlation function at a point in the right image, computed for a disparity D using a correlation window of size W, is defined as follows:

$$\mathrm{correlation} = \sum_{i}^{W} \sum_{j}^{W} I(x_i^r, y_j^r)\, I(x_i^l + D, y_j^l) \qquad (1)$$

(where I(x,y) is the image intensity at point (x,y).)
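A window correlation of the kind defined in equation (1) might be computed as in the sketch below. The image representation (lists of rows), the centering of the window on the point, and the rule of skipping pixels that fall outside the image are assumptions, not details given in the text.

```python
def window_correlation(right, left, x, y, disparity, window):
    """Sum of intensity products over a window x window patch centred at (x, y).

    right, left: images as lists of rows (row index y, column index x).
    The patch in the left image is shifted horizontally by `disparity`.
    Pixels falling outside either image are skipped (assumed boundary rule).
    """
    half = window // 2
    height, width = len(right), len(right[0])
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            yy, xr = y + dy, x + dx
            xl = xr + disparity
            if 0 <= yy < height and 0 <= xr < width and 0 <= xl < width:
                total += right[yy][xr] * left[yy][xl]
    return total


# Toy stereogram: three dots, displaced by two pixels in the left image.
right = [[0] * 8 for _ in range(5)]
left = [[0] * 8 for _ in range(5)]
for yy, xx in [(1, 1), (2, 3), (3, 2)]:
    right[yy][xx] = 1
    left[yy][xx + 2] = 1

# Correlation at the point (2, 2) for a range of candidate disparities.
scores = {d: window_correlation(right, left, 2, 2, d, 5) for d in range(-2, 5)}
print(scores)
```

In this toy case the correlation peaks at the disparity used to generate the left image; varying `window` changes how many neighboring dots contribute to the sum.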

To  study  the  interactions  between  neighboring  points  in  the  stereograms 
corresponding  to  figures  5a, b,  the  correlation  function  was  recomputed  using 
different  correlation  window  sizes  W.  When  small  windows  were  used,  in 
particular  when  the  window  was  smaller  than  5x5  pixels  and  Gr  and  Gi  ranged 
between  2  and  4  pixels,  the  correlation  function  for  stereograms  corresponding 
to  figure  5a  showed  a  different  distribution  when  compared  to  the  correlation 
function  for  stereograms  corresponding  to  figure  5b.  Two  examples  of  the 
correlation  function  for  the  stereogram  corresponding  to  figure  5a  are  shown 
in  figures  5c, e.  Two  examples  of  the  correlation  function  for  the  stereogram 
corresponding  to  figure  5b  are  shown  in  figures  5d,f. 

This discussion suggests that the algorithms discussed here, Prazdny’s matching algorithm and a patch-wise correlation maximization algorithm, should be restricted to small regions of interaction with neighboring dots in order to replicate human perception. In the next section, where these algorithms are studied in detail, the size of the interaction neighborhood is one of the parameters studied.

Simulations 

In the following simulations, simple implementations of Prazdny’s stereo matching algorithm and of a patch-wise correlation maximization matching algorithm were used, with an added postprocessing stage as discussed above. The algorithms were tested on the following ten cases:

1. A doubly ambiguous RDS, with Gr = 4, Gi = 2. The algorithm was expected to report at least three layers at disparities -2, 0, 2, 4.

2. A doubly ambiguous RDS, with Gr = 2, Gi = 2. The algorithm was expected to report a single layer at disparity 2.

3. A doubly ambiguous RDS, with Gr = 2, Gi = 0 (section 2.2). The algorithm was expected to report two layers at disparities -2, 0.

4. A doubly ambiguous RDS, with Gr = 4, Gi = 2, and with additional points at disparity 2. The algorithm was expected to report at least three layers at disparities -2, 0, 2, 4.

5. A doubly ambiguous RDS, with Gr = 2, Gi = 2, and with additional points at disparity -2. The algorithm was expected to report two layers at disparities -2, 2.

6. A doubly ambiguous RDS, with Gr = 4, Gi = 2, and with additional points at disparities 0, 2. The algorithm was expected to report at least three layers at disparities -2, 0, 2, 4.

7. An RDS as described in section 3.2, where the sparse layer includes 4% of the image points. The algorithm was expected to report two layers at disparities 2, 4.

8. An RDS as described in section 3.2, where the sparse layer includes 6% of the image points. The algorithm was expected to report two layers at disparities 2, 4.

9. An RDS as described in section 3.2, where the sparse layer includes 8% of the image points. The algorithm was expected to report three layers at disparities 0, 2, 4.

10. An RDS as described in section 3.2, where the sparse layer includes 10% of the image points. The algorithm was expected to report three layers at disparities 0, 2, 4.
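The expected outcomes of the ten cases can be written down as data and checked mechanically. The sketch below is one possible encoding, not the procedure used in the original simulations: the names are hypothetical, and "at least three layers at disparities -2, 0, 2, 4" is interpreted as any three or four of those disparities and no others, which is an assumption.

```python
# case number -> (disparities at which layers may be reported, minimal count)
EXPECTED = {
    1: ({-2, 0, 2, 4}, 3),
    2: ({2}, 1),
    3: ({-2, 0}, 2),
    4: ({-2, 0, 2, 4}, 3),
    5: ({-2, 2}, 2),
    6: ({-2, 0, 2, 4}, 3),
    7: ({2, 4}, 2),
    8: ({2, 4}, 2),
    9: ({0, 2, 4}, 3),
    10: ({0, 2, 4}, 3),
}


def agrees_with_expectation(case, reported):
    """True if the reported layers match the expected outcome for this case."""
    allowed, minimum = EXPECTED[case]
    reported = set(reported)
    # No layer outside the allowed disparities, and enough layers reported.
    return reported <= allowed and len(reported) >= minimum


print(agrees_with_expectation(1, [-2, 0, 2]))   # three of the four layers
print(agrees_with_expectation(7, [0, 2, 4]))    # an extra layer at disparity 0
```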

These  cases  were  chosen  as  a  representative  subset  of  the  stereograms  used 
in  the  psychophysical  experiments  described  in  sections  2-3,  including  all  the 
stereograms  that  may  pose  difficulty  to  a  matching  algorithm  for  one  of  the 
reasons  described  above.  In  particular,  the  stereograms  in  cases  4  and  5  have 
the  same  correlation  function,  given  in  figures  5a, b. 

The simulations were repeated for 5 different data sets produced randomly, and a few times for each data set to determine consistency (footnote 5). The performance of both algorithms was fairly consistent, though Prazdny’s algorithm showed slightly higher variability. The parameters that were varied for both algorithms were the window of interaction, from 5x5 pixels to 15x15, and the threshold on the minimal density of points that elicit the impression of a distinct layer, from 2% to 10%. A normalization parameter in Prazdny’s matching algorithm was also varied.

Footnote 5: When more than one disparity was given the maximal support at a point, one disparity was selected at random in the implementation of the two algorithms, and therefore a consistency check was necessary.
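A parameter sweep of this kind can be organised as a simple grid search. The sketch below is illustrative only: the step sizes (odd window sizes, integer percentages) are assumptions, and `toy_succeeds` is a stand-in predicate, not a result from the simulations. A real run would call the matcher on every test case and compare the reported layers with human perception.

```python
from itertools import product

WINDOWS = range(5, 16, 2)    # 5x5 ... 15x15 (odd sizes assumed)
THRESHOLDS = range(2, 11)    # 2% ... 10% (integer steps assumed)


def successful_settings(succeeds):
    """Return every (window, threshold%) pair for which `succeeds` is true."""
    return [(w, t) for w, t in product(WINDOWS, THRESHOLDS) if succeeds(w, t)]


def toy_succeeds(window, threshold):
    """Stand-in predicate for illustration; replace with a real evaluation."""
    return window <= 7 and threshold == 4


settings = successful_settings(toy_succeeds)
print(settings)
```

Reporting the full list of successful pairs, rather than a single best setting, is what makes it possible to judge how sensitive an algorithm is to its tuning.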


Results: 

Case 9 seemed to be a limit case, in which subjects could identify the sparse layer less reliably. This case was therefore discarded from the initial performance evaluation of the algorithms. It was considered in a subsequent analysis of the patch-wise correlation algorithm as discussed below.

Table 4 summarizes the results for the two algorithms, five data sets, and two to three repetitions on each data set. The result in each case is the set of pairs (window size in pixels, threshold value in percent) for which the algorithm succeeded. Prazdny’s algorithm was considered successful for a particular window size and threshold value if there existed a normalization coefficient with which these two parameters produced a successful result.


Prazdny’s algorithm                  patch-wise correlation
1st test   2nd test   3rd test       1st test   2nd test   3rd test

/ 

(5,5%)  (7,2%)  (7,3%) 

(5,5%)  (7,2%) 

(5,4%)  (5,5%)  (7,2%) 

(5,4%) 

(5,4%) 

/ 

/ 

f 

(5,5%) 

NA 

(5,4%) 

(5,4%) 

NA 

/ 

NA 

/ 

(7,2%) 

NA 

/ 

/ 

NA 

/ 

/ 

NA 

Table 4: Summary of the results of the simulations discussed in the text. Each row summarizes a different data set. Each algorithm is assigned three columns, for three separate tests of the algorithm on the same data. When the algorithm was tested only twice on a given data set, NA appears in its third column. The symbol / stands for a failure of the algorithm; otherwise the list of parameters for which the algorithm succeeded is given.

When an algorithm was successful for a particular set of parameters, typically the density of dots assigned to each disparity matched human performance in experiment 1 (section 3.1).

The correlation algorithm selects at each feature the disparity d that maximizes the correlation of a patch around the feature and a patch displaced by d in the second image. However, the disparity selected in this way may not correspond to a feasible match: there need not be a corresponding feature displaced by d in the second image. An improved version of the correlation algorithm was also simulated, where the disparity with the highest correlation value, among the disparities corresponding to feasible matches, was selected at each feature. This algorithm was more successful, in particular when dealing with low density transparent layers (cases 7-10). It was tested twice on each of four data sets (corresponding to the last four rows in table 4).
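The difference between the plain and the improved selection rule can be sketched as follows. The function and argument names are hypothetical, and the correlation scores are made-up numbers for illustration; the random tie-breaking follows the description of the implementation given earlier.

```python
import random


def select_disparity(scores, feasible=None, rng=random):
    """Pick the disparity with the highest correlation score at one feature.

    scores: dict mapping disparity -> correlation value.
    feasible: optional set of disparities at which a matching feature exists
        in the other image; when given, only these disparities are considered.
    Ties are broken at random.
    """
    if feasible is None:
        candidates = dict(scores)
    else:
        candidates = {d: s for d, s in scores.items() if d in feasible}
    if not candidates:
        return None
    top = max(candidates.values())
    return rng.choice([d for d, s in candidates.items() if s == top])


# Made-up scores: the overall maximum lies at a disparity with no real match.
scores = {-2: 3.0, 0: 5.0, 2: 4.0}
plain = select_disparity(scores)
improved = select_disparity(scores, feasible={-2, 2})
print(plain, improved)
```

Here the plain rule picks disparity 0 even though no feature exists there, while the improved rule picks the best disparity among the feasible matches.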

The results of the improved patch-wise correlation algorithm are given in table 5. The performance of this algorithm was robust enough to handle the limit case 9. These results show one set of parameters, (7,4%), for which the improved correlation algorithm succeeded in every trial, for all test cases. It also succeeded in almost every trial for the sets of parameters (9,3%) and (5,5%).


improved patch-wise correlation
1st test    2nd test

(9,3%)  (7,4%)  (7,3%)  (5,5%) 
(9,4%)  (9,3%)  (7,4%)  (7,3%)  (5,5%) 
(7,5%)  (7,4%) 

(9,3%)  (7,4%)  (5,5%) 

(11,3%)  (11,2%)  (9,3%)  (7,4%)  (5,5%) 
(9,3%)  (7,4%)  (5,7%)  (5,5%) 

(9,3%)  (7,5%)  (7,4%)  (5,5%) 

(9,4%)  (9,3%)  (7,4%) 

Table 5: Summary of the results of the simulations for the improved patch-wise correlation algorithm tested on four data sets, corresponding to the last four rows in table 4.


Discussion 

The results of these simulations show that neither algorithm performs consistently with human perception all the time. This should not be considered a major problem, since the results reported by some subjects also varied over time. Both algorithms agreed with humans for a rather small window of interaction, 5x5 pixels for Prazdny’s algorithm and from 5x5 to 7x7 for the correlation algorithm. The results of the patch-wise correlation algorithm seem to be consistent with humans more often, and for a wider range of parameters. Moreover, an improved version of the correlation algorithm proved to be consistent with human behavior all the time, for a wide range of parameters. It should also be noted that this algorithm is much faster and simpler to implement. However, it is not appropriate for the matching of single features, as discussed in section 4.1.
5  Summary

I have discussed old and new experiments with doubly ambiguous random dot stereograms. In these stereograms there is often no single “correct” matching of the left and right images; a few different solutions to the matching problem are conceivable. Humans select a particular solution. Their performance in these tasks, which has been described in this paper, can be used to evaluate stereo matching algorithms, identifying those that are more appropriate as models of human stereo vision.

Three simple stereo matching algorithms, representing two different matching approaches, were discussed in section 4. One algorithm, PMF, failed to explain the difference in ambiguity resolution between the random dot stereograms and the micropatterns of the stereograms presented in isolation. Prazdny’s stereo matching algorithm could explain this difficulty, but its tuning to explain the other experimental results proved to be hard. The patch-wise correlation maximization algorithm could be easily tuned to agree with human perception, requiring a small correlation window, though it is not suitable for the matching of isolated features. These results, and the conclusions of experiment 1 (section 3.1), support the idea that the matching of isolated features may involve different processes than the matching of random dot stereograms (cf. [3]).